A Lightweight Distributed Order and Duplication Insensitive Algorithm for Approximate Top-k Queries using Order Statistics

Authors

  • Vinay Deolalikar
  • Kave Eshghi
  • Hernan Laffitte
Abstract

1. APPROXIMATE TOP-K

Let $\{e_1, e_2, \ldots, e_l\}$ be a set of distinct records in a database, with unique IDs $\{id_1, id_2, \ldots, id_l\}$. Let $A_1, A_2, \ldots, A_p$ be a set of distinct attributes for each record. For every record $e_i$, the attribute $A_j$ is zero or some positive value. We denote the value of the attribute $A_j$ of record $e_i$ by $A_j(e_i)$. The sum of the attributes of $e_i$ is denoted by $N_i = \sum_j A_j(e_i)$. We would like to obtain the list of the top $k$ records, ordered by $N_i$. We present a highly configurable, lightweight, distributed algorithm that solves this problem approximately, based on order statistics.
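To make the problem statement concrete, the following minimal sketch computes the exact top-k list on a single machine. It is a reference baseline only, not the approximate distributed algorithm the abstract refers to, which is not described here. The function name `exact_top_k`, the dictionary layout, and the sample data are assumptions made for illustration.

```python
import heapq
from typing import Dict, List, Tuple


def exact_top_k(records: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Return the k record IDs with the largest attribute sums N_i.

    `records` maps a record ID to its attribute values A_1(e_i), ..., A_p(e_i),
    each assumed to be zero or positive (hypothetical layout for illustration).
    """
    # N_i = sum over j of A_j(e_i)
    totals = {record_id: sum(attrs) for record_id, attrs in records.items()}
    # nlargest returns the k largest items, sorted from largest to smallest N_i.
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])


# Hypothetical sample data: four records with three attributes each.
sample = {
    "id1": [1, 0, 2],
    "id2": [5, 5, 0],
    "id3": [0, 0, 0],
    "id4": [3, 3, 3],
}
top_two = exact_top_k(sample, 2)
```

An exact computation like this needs every attribute value in one place, which is the cost an approximate distributed approach aims to avoid.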


Similar resources

Unified Framework for Top-k Query Processing in Peer-to-Peer Networks

Supporting queries over dispersed data stored in large-scale distributed systems, such as peer-to-peer networks, naturally calls for ranked retrieval in order to effectively focus on the most relevant (i.e., top-k) results. While top-k retrieval has been actively studied lately, existing algorithms are too restrictive due to their assumptions about how the data is partitioned amongst the variou...


Pay-as-you-go Approximate Join Top-k Processing for the Web of Data Technical Report

To search the Web of data effectively, ranking of results is crucial. Top-k processing strategies have been proposed to allow efficient processing of such ranked queries. Top-k strategies aim at computing the k top-ranked results without complete result materialization. However, for many applications result computation time is much more important than result accuracy and completeness. Thus...


Pay-as-you-go Approximate Join Top-k Processing for the Web of Data

To search the Web of data effectively, ranking of results is crucial. Top-k processing strategies have been proposed to allow efficient processing of such ranked queries. Top-k strategies aim at computing the k top-ranked results without complete result materialization. However, for many applications result computation time is much more important than result accuracy and completeness. Thus...


Top-k Query Evaluation with Probabilistic Guarantees

Martin Theobald, Gerhard Weikum, Ralf Schenkel (Max-Planck Institute of Computer Science, D-66123 Saarbruecken, Germany; {mtb, weikum, schenkel}@mpi-sb.mpg.de)

Top-k queries based on ranking elements of multidimensional datasets are a fundamental building block for many kinds of information discovery. The best known general-purpose algorithm for evaluating top-k queries is Fagin’s thresho...


Finding Top-k Approximate Answers to Path Queries

We consider the problem of finding and ranking paths in semistructured data without necessarily knowing its full structure. The query language we adopt comprises conjunctions of regular path queries, allowing path variables to appear in the bodies and the heads of rules, so that paths can be returned to the user. We propose an approximate query matching semantics which adapts standard notions o...



Publication date: 2012